Abstract

There are various methods for analyzing different kinds of data sets. Data are called
spatial when the observations depend on one another according to their locations.
Spline and Kriging are two methods for interpolating and predicting spatial data.
Under certain conditions these methods are equivalent, but in practice they behave
differently.

Data can be observed only at the positions chosen as sample points; therefore,
predicting data values at other positions is important. In this paper the link between
the Spline and Kriging methods is described, and the behavior of the two methods is
then investigated on a real two-dimensional epidemiological data set observed in
geographical longitude and latitude. A comparison of their performance shows that, for
this data set, the Kriging method performs better than the Spline method.

1 Introduction

To analyze any kind of data, a model of the data structure can be considered. In
spatial data analysis, a random field $\{Z(t), t \in D \subset \mathbb{R}^d\}$ is used to
model spatial data, where $t$ is a spatial site and $D$ is an index set. For each $t$,
the random field $Z(t)$ can be decomposed as

$$Z(t) = \mu(t) + \delta(t) \qquad (1)$$

where $\mu(t)$ is the trend of the random field and $\delta(t)$ is a zero-mean random field. In
spatial statistics there are many methods for predicting the value of the random field at
a given spatial site, say $t_0$, using observations $Z = (Z(t_1), \ldots, Z(t_n))'$ of the random
field $Z(\cdot)$ at $n$ spatial sites $t_1, \ldots, t_n$. One of these methods, Kriging, is
the best linear unbiased predictor and comes in several variants, such as ordinary and
universal kriging. In ordinary kriging the trend term in relation (1) is constant, and in
universal kriging $\mu(t)$ is a function of $t$ (Cressie (1993)).
Spline is another method for spatial data prediction, which minimizes a penalized
sum of squares criterion. For more details about splines, see Green and
Silverman (1994), Hart (2005) and Härdle (2006).

Several authors have studied the link between Spline and Kriging as two prediction
methods. The theoretical link between these methods is studied by Kent and Mardia
(1994), and the applied link for some data sets is investigated by Hutchinson and
Gessler (1994) and Laslett (1994). In this paper these methods are applied to
predict values of data with two-dimensional positions. The data set concerns the
prevalence of tuberculosis infection in some cities of Iran, observed in geographical
longitude and latitude. Brief reviews of the Kriging and Spline methods
are given in sections 2 and 3 respectively, and in section 4 these methods are applied
to predict the rate of tuberculosis infection prevalence and their performances
are compared. Finally, results and conclusions are given in section 5, where
the better method for predicting tuberculosis infection prevalence is determined
based on the data.



2 Kriging 

In universal kriging, the trend term in relation (1) is an unknown linear combination
of known functions $f_j(\cdot)$ with unknown coefficients $\beta_j$, that is

$$\mu(t) = \sum_{j=1}^{p+1} \beta_{j-1} f_{j-1}(t)$$

where $\beta = (\beta_0, \ldots, \beta_p)' \in \mathbb{R}^{p+1}$ is an unknown vector of parameters. Furthermore,
the data $Z$ can be written as

$$Z = X\beta + \delta$$

where $X$ is an $n \times (p+1)$ matrix whose $(i,j)$th element is $f_{j-1}(t_i)$.
It is desired to predict $Z(t_0)$ linearly from the data $Z$, that is

$$\hat{Z}(t_0) = \lambda' Z, \qquad \lambda' X = x' \qquad (2)$$

which is uniformly unbiased ($E[\hat{Z}(t_0)] = E[Z(t_0)]$) and minimizes the mean squared
error $\sigma^2 = E[(Z(t_0) - \hat{Z}(t_0))^2]$ over $\lambda = (\lambda_1, \ldots, \lambda_n)'$.

The assumption $\lambda' X = x'$ in equation (2) is equivalent to the uniform unbiasedness
condition, where $x = (f_0(t_0), \ldots, f_p(t_0))'$. The optimal value of $\lambda$ in relation (2)
is then

$$\lambda' = [c + X(X'\Sigma^{-1}X)^{-1}(x - X'\Sigma^{-1}c)]' \Sigma^{-1} \qquad (3)$$

where $c = (c(t_0 - t_1), \ldots, c(t_0 - t_n))'$, $c(\cdot)$ is the covariogram, and $\Sigma$ is an
$n \times n$ matrix with $(i,j)$th element $c(t_i - t_j)$. The kriging variance can be written as

$$\sigma^2(t_0) = c(0) - c'\Sigma^{-1}c + (x - X'\Sigma^{-1}c)'(X'\Sigma^{-1}X)^{-1}(x - X'\Sigma^{-1}c) \qquad (4)$$

When $p = 0$ and $f_0(t) = 1$, universal kriging reduces to ordinary kriging.
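
The ordinary kriging case of relations (2) to (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian covariogram parameterization `sill * exp(-(|h|/rng)^2)`, its default parameters, the absence of a nugget, and the helper names `solve`, `gauss_cov` and `ordinary_kriging` are all hypothetical choices, not taken from the paper.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented copy
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(M[k][col] * x[col] for col in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x

def gauss_cov(s, t, sill=1.0, rng=1.0):
    """Assumed Gaussian covariogram c(h) = sill * exp(-(|h| / rng)^2)."""
    h2 = sum((a - b) ** 2 for a, b in zip(s, t))
    return sill * math.exp(-h2 / rng ** 2)

def ordinary_kriging(sites, z, t0, cov=gauss_cov):
    """Return (weights, prediction, variance) at t0 for p = 0, f_0 = 1."""
    n = len(sites)
    Sigma = [[cov(si, sj) for sj in sites] for si in sites]
    c = [cov(t0, si) for si in sites]
    Sinv_c = solve(Sigma, c)            # Sigma^{-1} c
    Sinv_1 = solve(Sigma, [1.0] * n)    # Sigma^{-1} X with X = (1, ..., 1)'
    resid = 1.0 - sum(Sinv_c)           # x - X' Sigma^{-1} c, with x = 1
    m = resid / sum(Sinv_1)             # (X' Sigma^{-1} X)^{-1} (x - X' Sigma^{-1} c)
    lam = [a + m * b for a, b in zip(Sinv_c, Sinv_1)]      # relation (3)
    pred = sum(l * zi for l, zi in zip(lam, z))            # relation (2)
    var = cov(t0, t0) - sum(ci * a for ci, a in zip(c, Sinv_c)) + resid * m  # relation (4)
    return lam, pred, var
```

For four sites on the corners of the unit square and `t0` at the center, the weights are equal by symmetry and sum to one, which is the unbiasedness constraint of relation (2) in the ordinary kriging case.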



In universal kriging, the optimal value of $\lambda$ (equation (3)) can be written as
$\lambda_U = \Sigma_U^{-1} c_U$, where $\lambda_U = (\lambda_1, \ldots, \lambda_n, -m_0, \ldots, -m_p)'$, the $m_i$ are Lagrange
multipliers that ensure $\lambda' X = x'$, $c_U = (c(t_0 - t_1), \ldots, c(t_0 - t_n), 1, f_1(t_0), \ldots, f_p(t_0))'$,
and $\Sigma_U$ is the $(n+p+1) \times (n+p+1)$ matrix with blocks $\Sigma$, $X$, $X'$ and $0$. The
kriging predictor at $t_0$ is then

$$\hat{Z}(t_0) = Z_U' \Sigma_U^{-1} c_U = V_U' c_U \qquad (5)$$

where $V_U = \Sigma_U^{-1} Z_U$ and $Z_U = (Z(t_1), \ldots, Z(t_n), 0, \ldots, 0)'$ is an $(n+p+1) \times 1$ vector.
Writing $V_U' = (V_1', V_2')$ in equation (5), so that $V_1$ is $n \times 1$ and $V_2$ is $(p+1) \times 1$,
the relation $\Sigma_U V_U = Z_U$ yields the dual kriging equations

$$\Sigma V_1 + X V_2 = Z, \qquad X' V_1 = 0 \qquad (6)$$

Solving this system and substituting the solution into relation (5), the predictor of
$Z(t_0)$ can be written as

$$\hat{Z}(t_0) = V_1' c + V_2' x$$
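
A minimal sketch of the dual form follows: the bordered system is solved once, and each new site then needs only the vectors $c$ and $x$. The Gaussian covariogram, its parameters, and the names `solve`, `gauss_cov` and `dual_kriging` are assumptions made for illustration; `trend` is a hypothetical list holding the functions $f_0, \ldots, f_p$.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented copy
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(M[k][col] * x[col] for col in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x

def gauss_cov(s, t, sill=1.0, rng=1.0):
    """Assumed Gaussian covariogram c(h) = sill * exp(-(|h| / rng)^2)."""
    h2 = sum((a - b) ** 2 for a, b in zip(s, t))
    return sill * math.exp(-h2 / rng ** 2)

def dual_kriging(sites, z, trend, cov=gauss_cov):
    """Solve the dual system once; return (predict, V1, V2)."""
    n, q = len(sites), len(trend)            # q = p + 1 trend functions
    X = [[f(s) for f in trend] for s in sites]
    # Bordered matrix [[Sigma, X], [X', 0]] with right-hand side (Z, 0).
    A = [[cov(si, sj) for sj in sites] + X[i] for i, si in enumerate(sites)]
    A += [[X[i][j] for i in range(n)] + [0.0] * q for j in range(q)]
    V = solve(A, list(z) + [0.0] * q)
    V1, V2 = V[:n], V[n:]

    def predict(t0):
        c = [cov(t0, s) for s in sites]      # covariances to the data sites
        x = [f(t0) for f in trend]           # trend functions at t0
        return (sum(v * ci for v, ci in zip(V1, c))
                + sum(v * xi for v, xi in zip(V2, x)))

    return predict, V1, V2
```

With `trend = [lambda t: 1.0]` this is the ordinary kriging case. Because no nugget is included in this sketch, the predictor reproduces the data exactly at the observed sites.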



3 Spline 

The data $Z$ of the random field $Z(\cdot)$ are given at locations $\{t_i \in D \subset \mathbb{R}^d, d \geq 1\}$.
Consider the problem of estimating the unknown function $g$ in the model

$$Z_i = g(t_i) + \epsilon_i, \qquad i = 1, \ldots, n \qquad (7)$$

To fit $g$ properly, the penalized sum of squares criterion is defined as

$$S(g, \alpha) = \sum_{i=1}^{n} (Z_i - g(t_i))^2 + \alpha J_{r+1}(g) \qquad (8)$$

where $\alpha > 0$ is the smoothing parameter. A function $g$ that minimizes the penalized sum
of squares criterion is called a spline. The penalty in the second term of equation (8) is

$$J_{r+1}(g) = \int |\nabla^{r+1} g(t)|^2 \, dt = \sum_{|m| = r+1} \binom{r+1}{m} \int \left( \frac{\partial^{r+1} g(t)}{\partial t[1]^{m[1]} \cdots \partial t[d]^{m[d]}} \right)^2 dt$$

where $\nabla^{r+1}$ is the $(r+1)$-fold iterated gradient of $g$, $t = (t[1], \ldots, t[d])$,
$m = (m[1], \ldots, m[d])$ and $|m| = m[1] + \cdots + m[d]$.

For $d = 2$, a function $g$ that minimizes the penalized sum of squares (8) is called a
thin-plate spline. For the choice of a proper value of $\alpha$, see Gu (2002), Hart
(2005) and Härdle (2006).
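
As a worked instance of the penalty, take $d = 2$ and $r = 1$ (an assumed choice for illustration; it is the usual thin-plate case) and write $t = (x, y)$. Reading the coefficient in the sum as the multinomial coefficient $(r+1)!/(m[1]! \cdots m[d]!)$, the multi-indices with $|m| = 2$ are $(2,0)$, $(1,1)$ and $(0,2)$, with coefficients 1, 2 and 1, so the penalty becomes:

```latex
J_2(g) = \int\!\!\int_{\mathbb{R}^2} \left[
    \left( \frac{\partial^2 g}{\partial x^2} \right)^2
  + 2 \left( \frac{\partial^2 g}{\partial x \, \partial y} \right)^2
  + \left( \frac{\partial^2 g}{\partial y^2} \right)^2
\right] dx \, dy
```

Any function that is linear in $x$ and $y$ has $J_2(g) = 0$, so planes are not penalized at all.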



Now the dual equations for the spline are considered in the case $d = 2$, because the
data studied here are two-dimensional. The smoothing spline of degree 2 is

$$\hat{Z}(t_0) = a_0 + a_1 x_0 + a_2 y_0 + \sum_{i=1}^{n} b_i e(t_0 - t_i) \qquad (9)$$

where $t_0 = (x_0, y_0)'$ and

$$e(h) = \|h\|^2 \log(\|h\|^2) / (16\pi)$$

In relation (9), $a = (a_0, a_1, a_2)'$ and $b = (b_1, \ldots, b_n)'$ solve

$$K_\alpha b + X a = Z, \qquad X' b = 0 \qquad (10)$$

where $K_\alpha = K + n\alpha I$, $K$ is the $n \times n$ matrix with $(i,j)$th element $e(t_i - t_j)$, $X$ is an
$n \times 3$ matrix with $i$th row $(1, x_i, y_i)$, $t_i = (x_i, y_i)'$ and $0 < \alpha < \infty$ is the smoothing
parameter.
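
The spline system can be sketched in the same way as the kriging one. The code below assumes the bordered system $K_\alpha b + X a = Z$, $X' b = 0$ with $K_\alpha = K + n\alpha I$, and sets $e(0) = 0$ by continuity; the names `solve`, `tps_kernel` and `fit_tps` are hypothetical, and this is not the authors' R or SPLUS program.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented copy
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(M[k][col] * x[col] for col in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x

def tps_kernel(s, t):
    """e(h) = |h|^2 log(|h|^2) / (16 pi), with e(0) = 0 by continuity."""
    h2 = (s[0] - t[0]) ** 2 + (s[1] - t[1]) ** 2
    return 0.0 if h2 == 0.0 else h2 * math.log(h2) / (16.0 * math.pi)

def fit_tps(sites, z, alpha):
    """Solve for (a, b) and return (predict, a, b)."""
    n = len(sites)
    X = [[1.0, s[0], s[1]] for s in sites]
    A = []
    for i in range(n):
        row = [tps_kernel(sites[i], sites[j]) for j in range(n)]
        row[i] += n * alpha                  # K_alpha = K + n * alpha * I
        A.append(row + X[i])
    for j in range(3):                       # constraint rows X' b = 0
        A.append([X[i][j] for i in range(n)] + [0.0, 0.0, 0.0])
    sol = solve(A, list(z) + [0.0, 0.0, 0.0])
    b, a = sol[:n], sol[n:]

    def predict(t0):
        return (a[0] + a[1] * t0[0] + a[2] * t0[1]
                + sum(bi * tps_kernel(t0, s) for bi, s in zip(b, sites)))

    return predict, a, b
```

A quick sanity check is to feed in data that lie exactly on a plane: the penalty of a plane is zero, so for any $\alpha > 0$ the fit should return $b \approx 0$ and the plane's coefficients in $a$.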

4 Application of Spline and Kriging to Prediction 



Dual equations (6) and (10) show that the equations for universal kriging and for the
spline have the same form; the only difference is that the spline uses a generalized
covariance in place of the covariogram. In the kriging method, when the second-order
stationarity condition is not satisfied, or whenever intrinsic random functions of
order k (IRF-k) are used, generalized covariances are applied as well. In that case the
dual equations of the kriging and spline methods coincide. Consequently the kriging and
spline methods are similar in theory, but they can differ in practice. In the next
section the two methods are compared on an epidemiological problem.
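
Written in block form, the correspondence between the two dual systems is easy to see. The display below assumes, for the comparison, that the kriging trend functions are $(1, x, y)$, so that both systems share the same matrix $X$:

```latex
\begin{pmatrix} \Sigma & X \\ X' & 0 \end{pmatrix}
\begin{pmatrix} V_1 \\ V_2 \end{pmatrix}
=
\begin{pmatrix} Z \\ 0 \end{pmatrix},
\qquad
\begin{pmatrix} K_\alpha & X \\ X' & 0 \end{pmatrix}
\begin{pmatrix} b \\ a \end{pmatrix}
=
\begin{pmatrix} Z \\ 0 \end{pmatrix}
```

The spline system is the kriging system with $\Sigma$ replaced by $K_\alpha = K + n\alpha I$, with $b$ in the role of $V_1$ and $a$ in the role of $V_2$.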

4.1 Data Set and Practical Comparison 

Here data on the prevalence of tuberculosis infection in the cities of Iran in 1999
are considered. The random field is nonstationary and the data have a trend; therefore
the data are detrended by median polishing. To estimate the covariogram, the classical
estimator is applied, and the Gaussian model is chosen as the best covariogram model
for this data set.
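
Median polishing can be sketched as follows for values arranged in a rectangular table. This is a simplified illustration only: how the city data were arranged on a grid is not stated in the paper, and the function name `median_polish` and the fixed iteration count are hypothetical choices.

```python
from statistics import median

def median_polish(table, n_iter=10):
    """Return (trend, residuals) for a rectangular table of values.

    Row and column medians are swept out alternately; whatever has been
    removed from the table is reported as the fitted trend.
    """
    resid = [list(row) for row in table]
    for _ in range(n_iter):
        for row in resid:                        # sweep out row medians
            m = median(row)
            for j in range(len(row)):
                row[j] -= m
        for j in range(len(resid[0])):           # sweep out column medians
            m = median(r[j] for r in resid)
            for r in resid:
                r[j] -= m
    trend = [[v - e for v, e in zip(row, rrow)]
             for row, rrow in zip(table, resid)]
    return trend, resid
```

The residual table is what would then be passed to covariogram estimation, and the trend is added back to the predictions afterwards.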

To compare the performance of the methods, a criterion is needed. Cross
validation is a popular means of assessing statistical estimation and prediction. If
the variogram model adequately describes the spatial dependence implicit in the data set,
then the predicted value $\hat{Z}(t_0)$ should be close to the true value $Z(t_0)$. Ideally,
additional observations on $Z(\cdot)$ would be available to check this, or some of the data
would initially be set aside to validate the spatial predictor. More likely, all of the
data are used to fit the variogram and build the spatial predictor, and there is no
possibility of taking more observations. In this case the cross validation approach can
be used. Let $2\gamma(h; \hat{\theta})$ be the fitted variogram model (obtained from the data); now
delete a datum $Z(t_j)$ and predict it with $\hat{Z}_{-j}(t_j)$ [based on the variogram estimator
$2\gamma(h; \hat{\theta})$ and the data $Z$ without $Z(t_j)$]. Its associated mean-square prediction
error is $\sigma^2_{-j}(t_j)$, which depends on the fitted variogram model.

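The closeness of the predicted values to the true values can be characterized by the
standardized mean square error of prediction

$$\mathrm{MSP} = \left[ \frac{1}{n} \sum_{j=1}^{n} \left( \frac{Z(t_j) - \hat{Z}_{-j}(t_j)}{\sigma_{-j}(t_j)} \right)^2 \right]^{1/2}$$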
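
The leave-one-out procedure can be sketched generically. The interface `predictor(train_sites, train_z, t0)`, returning a prediction and its prediction variance, is a hypothetical one chosen for this illustration, and `mean_predictor` is only a toy stand-in for a kriging or spline predictor. The sketch assumes that the criterion divides each squared residual by its prediction variance before averaging.

```python
import math

def loo_msp(sites, z, predictor):
    """Leave-one-out standardized root mean square prediction error.

    `predictor(train_sites, train_z, t0)` is a hypothetical interface that
    returns (prediction, prediction_variance) at t0.
    """
    n = len(z)
    total = 0.0
    for j in range(n):
        train_sites = sites[:j] + sites[j + 1:]   # drop the j-th site
        train_z = z[:j] + z[j + 1:]               # drop the j-th datum
        pred, var = predictor(train_sites, train_z, sites[j])
        total += (z[j] - pred) ** 2 / var         # standardized squared residual
    return math.sqrt(total / n)

def mean_predictor(train_sites, train_z, t0):
    """Toy stand-in: predict the sample mean and report unit variance."""
    return sum(train_z) / len(train_z), 1.0
```

In the setting described in the text, the variogram model is fitted once from all the data and then held fixed while each datum is deleted in turn; a predictor passed to `loo_msp` would need to follow the same convention.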

In this paper the spline and kriging methods are compared by this criterion, and the
method with the smaller MSP is taken to be the better one. For this data set, the
Gaussian model with a nugget effect equal to 39.8 is the best covariogram model for
kriging prediction. In the spline method the smoothing parameter must be determined;
for this data set, the best value, which minimizes the penalized sum of squares
criterion, is $\alpha = 208.6601$.

The cross validation criterion is applied to compare the methods. Programs for the
computations are written in the R and SPLUS environments for the two-dimensional
data set. The cross validation criterion equals 0.0239 for the kriging method and
0.0461 for the spline method.

Consequently, the kriging method performs better than the spline for this data
set. This result is reasonable because the spline usually uses one particular
generalized covariance function, whereas in kriging this function is chosen based on
the data. Therefore, for some data sets, the kriging method can perform better
than the spline.



5 Conclusion 

Under certain conditions the kriging and spline methods are equivalent, but in practice
there are differences between them. For instance, the spline usually uses a
particular generalized covariance function, whereas in kriging this function is
determined from the data; it is therefore expected that kriging performs better
in some situations. In this paper these methods are applied to predict the
rate of tuberculosis infection prevalence, which is an important problem in medicine.
The data were measured at two-dimensional sites and the computations were carried out
in the R and SPLUS environments. For this data set, the computations show that the
kriging method performs better than the spline. Consequently, Kriging can be the
preferable method of prediction for these data.